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ABSTRACT 

In this paper we describe the spherical harmonic transit telescope, a novel formalism for the analysis 
of transit radio telescopes. This all- sky approach bypasses the curved sky complications of traditional 
inter ferometry and so is particularly well suited to the analysis of wide-field radio interferometers. It 
enables compact and computationally efficient representations of the data and its statistics that allow 
new ways of approaching important problems like mapmaking and foreground removal. In particular, 
we show how it enables the use of the Karhunen-Loeve transform as a highly effective foreground filter, 
suppressing realistic foreground residuals for our fiducial example by at least a factor twenty below 
the 21 cm signal even in highly contaminated regions of the sky. This is despite the presence of the 
angle- frequency mode mixing inherent in real- world instruments with frequency-dependent beams. We 
show, using Fisher forecasting, that foreground cleaning has little effect on power spectrum constraints 
compared to hypothetical foreground- free measurements. Beyond providing a natural real- world data 
analysis framework for 21 cm telescopes now under construction and future experiments, this formalism 
allows accurate power spectrum forecasts to be made that include the interplay of design constraints 
and realistic experimental systematics with twenty-ffist century 21 cm science. 



1. INTRODUCTION 

Mapping the Universe with the 21cm line of neutral 
hydrogen will revolutionise our view of the Universe. It 
holds the promise of unravelling the mysteries of dark en- 
ergy dChang et al.|2QQ8||Loeb"fc Wyithe 2Q Q8|), unveilin g 
the epoch of reionisation (EoK) ( [Furlan etto et al.||2QQ6 ), 
and perhaps even extending our view of the cosmos out 
far enough to shine hght on the primordial dark ages 
(Loeb & Zaldarriaga 2004). Rapidly probing large vol- 
umes of the Universe requires new large wide-field tele- 
scopes along with powerful new digital processing hard- 
ware such as GMR#1 LOFARR MWAPJ Omniscope^ 
PAPERR BA0BA# BAORadi^BINGlQ CHIMeQ 
EMBRACE/EMM and Tianla^ In recent years it 
has become increasingly clear that new methods of in- 
terpreting and analysing the data from these revolution- 
ary new instruments will be necessary to realize their 
scient ific potential f Myers et al. 2003; Tegmark & Zal- 
daxriaga 2009; Parsons & Backer 2009; Liu et al. 2010; 
Liu TeRmark|r20Tll [Parsons et al.||2012| |Dillon et al.| 
20T2f^ 

"We describe here the spherical harmonic transit tele- 
scope, a new paradigm for analysing wide-field tran- 
sit telescopes in the spherical harmonic domain that is 
naturally suited to mapping the 21 cm Universe. Any 
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telescope with fixed pointing observes the sky transit 
through its field of view. The rotation about the poles 
periodically over the course of a sidereal day creates a 
linear correspondence between time t and azimuthal an- 
gle (j). We obtain a simple mapping between the observed 
data and a linear combination of the spherical harmonic 
coefficients aim of the sky at fixed angular wavenumber 
m, mediated by the angular response of each element. In 
what follows, we elaborate and make precise this basic 
idea in the context of wide- field interferometers, includ- 
ing the radial (frequency) direction. 

This formalism diverges sharply with traditional char- 
acterizations of radio interferometry that are better 
suited to observations with a narrow field of view, of- 
ten assuming tracking of a particular source of interest, 
and exploit the Fourier transform mapping between the 
sky and the ixv-plane. What we describe here is an all 
sky formalism for describing interferometry that natu- 
rally incorporates the observable modes on the sky — 
the spherical harmonics. 

The foremost challenge for any 21 cm mapping exper- 
iment is separating the cosmological signal from astro- 
hysical contamin ants which are 10^-10^ times larger 
Furlanetto et al.||2006[ [Morales fc Wyithe||2010[ ). Con- 
ceptually this is simple — the primary foreground sources 
(diffuse synchrotron emission from the Galaxy and emis- 
sion from extragalactic point sources) are smooth as a 
function of frequency, while the 21 cm signal decorrelates 
quickly as each frequency corresponds to a different ra- 
dial slice of the Universe. To remove foregrounds one 
just needs to model and remove the smooth frequency 
component from their observations. Unfortunately, in 
practice, the large dynamic range between the ampli- 
tude of the foregrounds and the 21 cm signal makes sev- 
eral real-world effects extremely problematic. While the 
properties of the cosmological 21cm signal are thought 
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to be well understood, the astrophysical foregrounds are 
poorly constrained at the small angular and frequency 
scales that will be probed by forthcoming 21 cm experi- 
ments — a successful technique should be robust to un- 
certainties in foreground modelling. Of course, these ex- 
periments will themselves help characterize the proper- 
ties of real- world foregrounds. More troublesome is the 
phenomena of angular-frequency mode mixing: in any 
real experiment the shape of the beam on the sk y will 
vary with the observed frequency ( jLiu et aLpOO Q*). This 
mode mixing makes simple frequency only foreground 
removal methods ineffective in practice. We show below 
that the spherical harmonic transit telescope formalism 
can naturally address the issues of model uncertainty and 
mode mixing, and enable efficient and effective discrimi- 
nation of the 21cm signal from obscuring foregrounds. 

Any foreground removal method aims to find a subset 
of the data within which there is significantly more 21 cm 
signal than astrophysical foregrounds. However, in the 
presence of mode-mixing, it is not obvious how to select 
a basis which separates the two components — what we 
would like is a method which can automatically gener- 
ate it. Just such a technique exists in the form of the 
Karhunen-Loeve (KL) transform. In this paper we show 
how the m-mode formalism, described hence, makes the 
use of the KL transform computationally feasible. The 
result is a remarkably effective and robust filter for re- 
jecting bright foregrounds and we demonstrate its effec- 
tiveness using realistic simulations of the radio sky and 
a simple fiducial interferometer configuration. 

In Sec. |2] we introduce the all- sky formalism that is 
the basis for this technique. In Sec |3] we discuss the 
map-making process in the spherical harmonic transit 
telescope paradigm. In Sec. [4] we discuss how to best 
represent statistics of the cosmological 21 cm signal and 
foregrounds in the measurement basis, and in Sec. [5] 
we discuss how the Karhunen-Loeve transform can be 
used to detect faint signals in the presence of bright fore- 
grounds. In Sec. [6] we quantify the information lost due 
to foreground removal using Fisher Analysis, and esti- 
mate errors on power spectra. We conclude in Sec. [7| In 
Appendix |A] we discuss the signal and foreground models 
we employ. In Appendix [B] we describe how we create 
realistic simulations of radio emission. 

2. FORMALISM 

In this section we introduce the m-mode formalism, a 
new description of the measurement process for transit 
interferometers. 

In radio interferometry a visibility Vij is the instan- 
taneous correlation between two feeds Fi and Fj. We 
will assume that we can take a linear combination of the 
signal from a dual polarisation antenna, with no cross- 
polarisation or polarisation leakage, such that we are sen- 
sitive only to the total intensity (Stokes /) part of the 
sky. The fully polarised extension to this work is also a 
trac table problem, we a ddress this in a subsequent pa- 



per, Shaw et al. (2013). At any instant, a visibility is 
given by 
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where Uij = (r^ — rj)/X is the spatial separation be- 
tween the two feeds divided by the observed wavelength 
(that is the separation in the iX'U-plane), n is the posi- 
tion on the celestial sphere, and Ai{n) gives the primary 
beam of feed i. In the above we have normalised our 
visibilities such that they are temperature like, and we 
have defined them in terms of the brightness tempera- 
ture T = X^I /2k}) instead of the total intensity /. The 
quantity Vtij = is the geometric mean of the in- 

dividual beam solid angles 
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which also gives the effective antenna area Aq^^VL = A^. 
This ensures that for a sky with uniform brightness tem- 
perature T the auto-correlation of an antenna Vu = T 
with our definition. 

As the Earth turns both the primary beams and 
the baseline separations rotate relative to the celestial 
sphere. This means the measured visibilities change pe- 
riodically with the sidereal day. We take this into account 
by explicitly including the dependence on the azimuthal 
angle (j) and by averaging over each sidereal day. 

The measured visibilities are also corrupted by instru- 
mental noise for which we add a noise term n^j(^). We 
assume the noise is stationar y such that its statistics are 

Equation (l)| in terms of a 



independent of <p. Rewriting 



transfer function leaves i. 



le measured visibility as 
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;((/)) = Jd^h Bij (n; (/))T(n) + n,, ((/>) (3) 



where the transfer function is 
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B^J{n; (/>) = ^A{n; (/>)A*(n; ^)e^-^^-^^M) . (4) 

Taking advantage of the periodicity in ^, we Fourier 
transform the system 



-imcf) 



(5) 
(6) 



where to proceed to the second hne we have inserted the 
spherical harmonic expansions of both the sky, and the 
beam transfer function 

T{n) = J2aimYiUn) , (7) 

Im 

B,,in;<P) = J2BiLim*min). (8) 



Note that we have defined B^^ relative to the conjugate 
spherical harmonic in order to simplify later notation. 
As the (j) dependence simply rotates the functions about 
the Earth's polar axis, the transfer function at any (j) is 
trivially ^^^^(0) = 5;^(^ = 0)e^^^. Combined with the 
exponential factor in the integral, this simply generates 
the Kronecker delta Smm' , and we find 



(9) 
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This gives a simple description of how the observed 
sky maps into the measured data given a telescope de- 
sign (which is contained in the beam transfer matrices 
Bi^l^). This transformation does not mix m- modes on 
the sky, and can therefore be performed on an m-by-m 
basis — for any particular m and frequency u the mea- 
sured visibilities are simply a projection of the /-modes 
on the sky for the measured m. As the optical system is 
of a finite size, this limits both the / and m to which the 
telescope is sensitive, ensuring we only need to consider a 
finite number of degrees of freedom, both measured (Vm) 
and on the sky {aim)- 

In fact whilst the positive and negative m-modes may 
be independent measurements they are still observations 
of the same sky — by transforming the conjugate Vl^ 
and using that aim = cl'i -m a real field we see that 



(10) 



In light of this we will change our notation such that 
we are considering only the actual degrees of freedom on 
the sky. Let us separate out the positive and negative m 
parts by defining 



Im 
Im 
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(12) 



which is valid for m > 0. Additionally to prevent double 
counting the m 



measurement we need t o set B^^ 



Tin 



0. This gives a modified version of Equation (14) 
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For brevity of notation, we will introduce a label a which 
indexes both the positive and negative m parts of all 
included feed pairs zj, such that any particular a specifies 
exactly the values of zj, ± (exactly how a is packed is 
unimportant). This gives 



air. 



(14) 



The beam transfer matrices above can be written in an 
explicit matrix notation 
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where the row index labels all combinations of baseline 
(a) and frequency {v)^ whereas the column index is over 
all multipole (/) and frequencies {v'). Similarly we can 
define vectors for the visibilities and harmonic coeffi- 
cients 



(dm) 
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From here onwards we'll drop the subscript m denoting 
the spherical harmonic order, all the equa tions below are 
valid for any m. This allows us to rewrite Equation (14)| 

as 

v = Ba^n. (17) 

This simple linear description of the measurement pro- 
cess of a transit telescope is extremely powerful. By 
reducing it down to a linear mapping between a finite 



number of degrees of freedom it allows us to apply the 
standard tools of signal processing. In the subsequent 
sections we apply it to solve two challenging problems in 
21 cm radio astronomy. 

3. MAP-MAKING 

In astronomy being able to transform our measured sig- 
nal into an accurate map of the sky is essential. Whilst 
in this paper we explicitly avoid this process for our 
analysis, preferring to carry it out directly in the data 
space, maps are still needed for visualisation and cross- 
checking. Map-making with interferometric data is gen- 
erally a complicat ed process p erformed by algorithms 
such as CLEAN ( Hogbom|1974| ) and its derivatives. This 
is especially true with wide fields of view where mosaic- 
ing and K;-projection are generally required. However, 
the m-mode formalism makes the map-making process 
on the full sky conceptually simple. 

First, we assume that the instrumental noise n fol- 
lows a complex gaussian distribution with covariance 
N = (nn)^^ and the different frequency channels are 
independent. For stationary noise, the m-modes are un- 
correlated and the likelihood function of the observed sky 
for a single m and frequency v is 



p{v\a) 
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exp(^-(t;-Ba)'^N"^(i;-Ba)) (18) 



where the vector a contains all harmonic coefficients for 
the given m. 

To estimate the sky corresponding to a given set of 
visibilities we will look for a maximum likelihood solution 
dp/da = 0. In particular we want to find the value of a 
that minimises 
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N 2^ 



(19) 



The matrix N 2 represents any factorisation such that 
(N~2)tN-2 = N~"^. Provided N contains no noise- 
less modes, it is positive-definite and so this factorisation 
should exist. The maximum likelihood solution is given 
by the Moore- Penrose pseudo-inverse 



(^N"^B^^ 



N" 



(20) 



where the superscript + denotes the pseudo-inverse. 

Depending on the number of baselines measured and 
the maximum / we are sensitive to, the problem may ei- 
ther be over- or under-constrained. In either regime the 
Moore- Penrose pseudo-inverse gives a solution, in the for- 
mer case this reduces to the standard map making equa- 
tion d = (B'''N"^B)"^B''"N"^i;, and in the latter case 

selects the solution which also minimises |d|^, effectively 
setting unconstrained degrees of freedom to zero. 

As both distinct frequencies and m-modes are indepen- 
dent, map- making for a set of full sky observations is a 
case of collating the estimates for each individual v and 



For details see |http : //en . wikipedia . org/wikT7| 
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4. TWO POINT STATISTICS 

For Intensity Mapping experiments, our data has three 
components: the 21cm signal which we are ultimately 
trying to extract, the foregrounds, and instrumental 
noise. Understanding the 2-point statistics of the data 
is of paramount importance to our analysis — not only 
do the correlations of the signal encode most of the 
cosmological information that we are interested in (see 
Appendix [A|), but to efficiently extract this we require 
knowledge of the 2-point statistics of all three compo- 
nents. Here we write down the linear relationship be- 
tween these 2-point statistics of the data, and how they 
are related to the underlying physical correlations. 

The statistics of instrumental noise live in the visibility 
space, the basis of our measurements. However the other 
components are naturally represented o n the sky, and 



must be projected into this space using Equation (14) 
The lowest non-zero moment of the visibilities is their 



covariance 



c, 



{avm)\{a' v' m') 



\ ^ au ^ ol' v' I 



(21) 



This is the covariance between all measured degrees of 
freedom: baselines, frequencies, and m-modes. For the 
experiments listed in Section [l] we expect > 10^, ^ 10^ 
and 10^ respectively. This gives matrices of dimension 
> 10^, too large to be tackled with current technology, 
both in terms of computation and storage. 

Instead, let us make an approximation that will dra- 
matically reduce this complexity. If we think of the sky 
as a statistically isotropic random field, its two point 
statistics become dramatically simpler 



(22) 



and importantly, they are automatically uncorrelated in 
the m index. T his means that the full signal covariance 
Equation (21) is block diagonal and thus allows us to 
calculate all statistics on an m-by-m basis. For a specific 
m-mode 



(23) 

where we have dropped all the m-indices, and N is the 
power spectrum on the instrumental noise. As the num- 
ber of m-modes we are sensitive to is usually > 10^, 
assuming statistical isotropy saves at least a factor of a 
million in computation and a thousand in storage. This 
ability to efficiently perform calculations incorporating 
the full statistics opens up new avenues for the data anal- 
ysis of transit instruments. Synchrotron emission from 
our galaxy clearly violates this assumption of statistical 
isotropy, though, as we will demonstrate, this does not 
appear to affect our analysis and in particular our ability 
to clean foregrounds. 
In matrix notation 

C = BCskyB^ + N . (24) 

where we will split Cgky into independent 21 cm signal 
and foreground parts Cgky = C21 + C/. 



The statistical models used for each component are 
chosen to be appropriate for the frequency ranges of in- 
terest. In the fiducial example that follows this is 400- 
600 MHz, corresponding to z ^ 1-2 for the cosmological 
signal. These are described in Appendix [A) 

5. FOREGROUND REMOVAL WITH THE 
KARHUNEN-LOEVE TRANSFORM 

To clean our data we simply aim to find a subset within 
which there is significantly more 21cm signal than the 
astrophysical foregrounds. However, in the presence of 
mode mixing there is no immediately apparent represen- 
tation in which to perform this. This basis can be found 
using the Karhunen-Loeve transform (often called the 
Signal-to-Noise eigendeco mposition) , which has a lon^ 



history in Cosmology (e.g. |Bond||1995[ [Vogeley fc Szalay 
3d tor the analogous prob lem of 
E/B m ode sepa ration for pol arisation of the CMB ( Lewis 



19961) and has been used 
. ^ ^P 

et al][2002; Bunn et al.|l2003). This transform simul- 



taneously diagonalises both the signal and foreground 
covariance matrices, generating a set of modes with no 
foreground or signal correlations. This makes comparing 
the amount of signal and foreground power in each mode 
trivial. 

To reduce the risk of foreground uncertainties biasing 
our analysis, we will prioritise the removal of foreground 
contaminated modes at the expense of cosmological sig- 
nal. In contrast, the instrumental noise is well under- 
stood, and we should be able to dig deeper into this con- 
taminant to extract useful cosmological information with 
little risk. In practice, this means we will start with a 
filter which aggressively removes foregrounds only; sub- 
sequently we will add back in the instrumental noise, 
which will allow us to compress the data by removing 
completely noise-dominated modes, while retaining those 
with a small fraction of signal. 

This requires models for the statistics of both the signal 
and the foregrounds. The signal is modelled as a simple 
gaussian random field for the 21 cm emission, whereas the 
foreground model includes both the synchrotron emission 
from our galaxy, and the contribution from a background 
of extragalactic point sources. The details of both are 
discussed in Appendix [A} 

The Karhunen-Loeve transform seeks to find a linear 
transformation of the data v' = Pv such that the 21 cm 
signal S = BC2iB^ and foreground F = BC/B^ covari- 
ance matrices are jointly diagonalised. That is 



and 



S ^ = PSP^ = A , 
F ^ = PFP^ = I , 



(25) 
(26) 



where A is a diagonal matrix, and I is the identity. In 
this diagonal basis we can simply compare the amount of 
power expected in each mode by the ratio of the diagonal 
elements (this is given by the corresponding entries of A), 
and identify the regions of the space with low foreground 
contamination (large entries in A). 

This transformation can be found by solving the gen- 
eralised eigenvalue problem Sx = XFx. This gives a 
set of eigenvectors ic, and corresponding eigenvalues A. 
Writing the eigenvectors in a matrix P, row- wise, gives 
the transformation matrix to diagonalise the covariances. 



The eigenvalues A corresponding to each eigenvector give 
the diagonal matrix A 

To isolate the 21 cm signal, we want select modes with 
eigenvalue (signal- to- foregro und power) greater than 
some threshold (see Figure 1). To project into this basis 
we define the matrix Pg which contains only the rows 
from P corresponding to eigenvalues greater than the 
threshold s. 

For most analysis we can work directly in the eigenba- 
sis. However, for visualising our results, we want to be 
able to transform back to the sky (by way of the mea- 
sured visibilities). To project back into the higher dimen- 
sional space we simply generate the full inverse P~^ and 
remove columns corresponding to the rejected modes (we 
denote this matrix P^). This is equivalent to projecting 
into the full eigenbasis, zeroing the foreground contami- 
nated modes, and then using the full-inverse P~^. 

For further analysis, we must include all noise terms, 
both foregrounds and instrumental. Writing the total 
noise contribution as Naii = F + N, the matrix in the 
truncated basis is 



Signal /Foreground Ratio — log^g (S/F) 



= I + p.,np! . 



(27) 

(28) 

As the transformed instrumental noise matrix will not re- 
main diagonal this gives a correlated component between 
all our modes. However, as it is useful if our modes are 
uncorrelated we make a further KL-transformation on 
the foreground removed signal = A^, and total noise 
N^^^ covariance matrices. For computational and stor- 
age efficiency we apply a further cut-off to include only 
modes with a signal to total-noise ratio greater than cut- 
off value t. We denote this projection matrix Q. For 
notational convenience we will write the total transfor- 
mation in terms of a single matrix R = Q^Pg, having 
chosen suitable values for the two cut-offs s and t. As 
above, we will define an inverse R = P^Q^ which remains 
orthogonal to the removed space. Quantities in this fi- 
nal basis we denote with tildes, for example a visibility 
mapped into this basis is i; = Rv, and a covariance is 
C = RCR^ 

5.1. Cylinder Example 

While this method works with any transit telescope, 
to illustrate the foreground removal process we will sim- 
ulate a cylinder telescope, s uch as CH IME or the Pitts- 
burgh Cylinder Telescope ( Bandura| [2Qll ). These are 
transit interferometers composed of multiple parabolic 
cylinders where each only focuses in the East- West di- 
rection. This gives a long and and thin primary beam on 
the sky, extending nearly from horizon to horizon in the 
North- South direction but which is only around 1 degree 
wide East- West. 

Feeds are spaced along the axis of each cylinder — 
when correlated these provide resolution in the N-S di- 
rection. Correlations between cylinders enhance the E-W 
resolution. The telescope operates as a transit telescope 
such that the entire visible sky is observed once per side- 
real day. 

For a cylinder uniformly illuminated by a particular 
feed, near the axis the beam pattern is a sine function 
in the E-W direction, and uniform in the N-S direction 
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Figure 1. The Signal-to- Foreground spectrum for all m-modes. 
We have plotted log^o Km, where for each m the eigenvalues have 
been sorted by in ascending order (thus there is no physical inter- 
pretation to the vertical direction). The contours are drawn at —4, 
-2, 0, 2 and 4. 



(Wilson et al.|2009| chapter 6). To extend this off- axis 



we modulate by projected area of the telescope giving 

W 



A (n) = sine ( tt n • u— ] Q {n ■ z) h ■ z 



(29) 



where W is the cylinder width and z is a unit vector 
pointing to the zenith and n is a unit vector pointing 
East in the ground-plane. The step function 9 masks 
out the regions where the sky is below the horizon, and 
the final factor n ■ z accounts for the projected area of 
the telescope. 

We model a two cylinder telescope observing the sky 
with 64 frequency channels from 400-600 MHz. Each 
cylinder is 15 m wide and has 60 feeds regularly spaced by 
0.25 m (with the feeds lining up E-W between cylinders). 
The telescope is located at a latitude of 45°. These speci- 
fications correspond to a slightly smaller half-bandwidth 
version of the CHIME pathfinder telescope being con- 
structed at DRAG. 

The noise covariance is diagonal for all m, frequencies 
and baselines. For small m-modes with m <C 1/ (27rA^) 
(where Acj) is the angular integration time), the noise 
variance is 



^sys,i (^)^sys,j (^) 
47rA^day4idAi^ 



(30) 



where Tsys,i is the system temperature of a single po- 
larisation of feed i, A/'day is the number of sidereal days 
observed, tgid is the length of a sidereal day, and Au is 
the width of the frequency channel. As we combine the 
two polarisations into a single unpolarised signal this re- 
duces the noise power spectrum by a factor of two. For 
this example T^ys = 50 K, and we assume two full years 
of ob servation ( that is 730 complete sidereal days). 
In [Figure 7] we show the spectrum of Signal-to- 



Foreground eigenvalues for the telescope. The KL mode 
distribution of S/F has an extremely rapidly rising spec- 
trum so that the information retained (approximately 
the number of modes) is rather insensitive to the cut 
threshold s for values between 10~^-10^. 
To demonstrate the foreground removal process we 
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0.64 640.00 0.0000001 430.0000000 -0.000023 0.000026 




-0.0012 0.0011 -0.00071 0.00071 -0.00042 0.00046 



Figure 2. This plot illustrates the foreground removal process in action on simulations of the foregrounds-only (top row) and signal-only 
(bottom row). Each plot has two elements, an image of the 400 MHz frequency slice on top, and beneath, a cut through the celestial equator 
(from 270-300 degrees) showing the frequency axis. The left-most column shows the original simulations on the sky. The band appearing 
in the foreground frequency slice is the galactic plane. The middle column shows the maximum likelihood map that we would make from 
the measured visibilities without subtracting the low S/F modes. The maps are blank below 6 = —45° because this area is always below 
the horizon for the telescope at a latitude of 45°. The final column shows the maps made after the foreground removal process (in this case 
we have discarded modes with S/F < 10). This leaves a clear correspondence between the original signal simulation and the foreground 
subtracted signal, whilst leaving the foreground residuals over 20 times smaller in amplitude. 



simulate time-streams from separate realisa tions of the 
signal and foregrounds using Equation (17)[ and project 
them through the filtering process to make maps. The 
visibilities are filtered using ydean = RRt^, and then are 
turned into a 3D map using Equation (20) | In Figure 2 



we show the original simulation, the map niade from the 
unfiltered visibilities, and the map made from the fore- 
ground filtered visibilities. The simulated signal and fore- 
ground maps are described in AppendixjB] Note that the 
foreground maps are not simply realisations of the model 
used to generate the foreground filter — unlike the input 



model they are both non- gaussian and anisotropic. |Fig 
ure 2 1 clearly illustrates how the foreground amplitude 



is dramatically reduced by the process, whilst the sig- 
nal retains its overall character. Though the foreground 
residuals are clearly highest in the galactic centre, even 
these are significantly lower than the filtered signal. 



6. FISHER ANALYSIS 

In the previous section we have demonstrated that the 
Karhunen-Loeve transform gives an effective method for 
removing foregrounds. Though a visual inspection of 
[Figure 2 suggests that the 21cm signal is largely un- 
touched, we would like to be able to quantify how much 
useful information remains. In this section we will use 



the Fisher matrix (see [Dodelson 2003 chap. 11 for an 
overview) to forecast power spectrum errors, for the same 
telescope, with and without foreground removal. 

After projection into the reduced eigenbasis, let us as- 
sume that the remaining modes follow a complex gaus- 
sian distribution with zero mean. This assumption 
should be reasonable provided we have successfully re- 
moved the modes containing any significant foreground 
contribution. In this case the Fisher Information matrix 
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Figure 3. 21cm Intensity Mapping provides a powerful tech- 
nique for measuring the shape of the matter power spectrum. In 
the plots above we illustrate the power spectrum constraints that 
could be achieved with the large cylinder telescope. The top plot 
shows the constraints on the whole power spectrum, the lower plot 
zooms in on the region with the Baryon Acoustic Oscillations, di- 
viding through by a smoothed spectrum to remove the general 
trend. The dark shaded bands are the errors we would find with- 
out foregrounds, where the only noise is instrumental, the light 
bands include both contributions. 



P{k) = J2PaPa{k) ■ 



(33) 



In Appendix [A] we describe how to project this quantity 
into the angular power spectrum of 21cm fluctuations 
that we use to calculate the visibility correlations. For 
simplicity, each of our bands is simply equal to the input 
power spectrum within a fixed /c-band, and zero outside, 
such that the fiducial model is Pa = I. 

For the band-powers Pa that we are trying to estimate, 
the matrices are simply the projection of the basis 
functions Pa{k) into the eigenbasis. Starting from the 
angular power spectra Ca-iiv^v') c orresponding to each 
of the basis functions Pa{k) (using Equation (A2) ) 



RBCoi .B^Rt . 



(34) 



In practice explicitly calculating the this way is 
computationally expensive, we instead use a Monte- 
Carlo technique. We can form the estimator qa = 

x^Q CflC which has t he property that its covari - 

ance {q 

aQb) {Qa) {Qb) — ^ab ( [Padmanabhan et al.|2QQ3[ ). 
This means we can estimate th e Fg^ by averaging over 
reali sations of x. For details see Dillon et al. (2012). 

In |Figure~3] we plot the power spectrum errors tor two 
cases^ in the presence of foregrounds that have been 
cleaned using our method and without foregrounds at all. 
In the case without foregrounds, F = and we only per- 
form the final Karhunen-Loeve transform to diagonalise 
the signal and instrumental noise. For the foregrounds 
we have cleaned modes with S/F < 10 and additionally 
have removed modes with a small ratio of signal to total 
power. This corresponds to setting 5 = 10 and t = 0.01. 
This is a clear demonstration of the effectiveness of the 
technique — it reduces our sensitivity on large scales as 
we would expect (as the removed foreground are smooth 
on large scales), while only slightly reducing our ability 
to constrain the small scale power spectrum. 



for a set of parameters Pa is 

F^-^ = tr (C,C"'C5C"' 



(31) 



where = dC/dpa- Though in the constructed eigen- 
basis C = A + I is diagonal, can have off-diagonal 
elements. Again this process is performed on a per-m 
basis. As there is no coupling between them, the total 
Fisher Information is simply the sum over all m-modes 



Fab 



(32) 



For a set of parameters Pa that we are trying to deter- 
mine, the inverse of the Fisher matrix is the lowest order 
approximation to their covariance. 

In this work we will focus on forecasting the errors 
on the shape of the matter power spectrum P{k) whilst 
keeping all other cosmological parameters fixed. Such 



forecasting has b een performed using the li^'-plane in Seo 
|et al.| (|2010) and'Ansari e t al.| ( [2QT2p . 

We parametrise the power spectrum in terms of a linear 



7. CONCLUSION 

In this paper we have introduced a powerful formalism 
for describing the measurement process of transit tele- 
scopes (either interferometric or otherwise). It is a natu- 
ral formalism to describe interferometry on the full sky — 
sidestepping the standard complications that arise when 
dealing with wide field interferometric data such as mo- 
saicing and iL'-projection. A spherical harmonic transit 
telescope allows for compact and computationally effi- 
cient representations of the data and its statistics, which 
enable new ways of approaching important problems like 
map-making and foreground removal. 

Using the m-mode formalism and approximating the 
foregrounds as statistically isotropic allows the power- 
ful Karhunen-Loeve transformation to be used, auto- 
matically finding the basis in which the astrophysical 
foregrounds and 21 cm signal are maximally separated. 
The KL approach would be computationally impossi- 
ble otherwise and is a key advantage of the m-mode 
formalism. Using this technique we can take the full 
three-dimensional dataset into account and overcome 
the mode-mixing problem. The filters we construct are 
highly effective and robust, a fact we have demonstrated 
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by propagating through reahsticahy simulated 21 cm and 
foreground timestreams. In our fiducial example, shown 
in [Figure 2} peak-to-peak foreground amplitude was re- 
duce by a factor of ~ 2 x 10^ leaving the peak-to-peak 
amplitude of the 21 cm signal around 20 times brighter 
that the foreground residuals. 

We have also used this formalism to produce realis- 
tic forecasts for the power spectrum constraints from a 
fiducial 21 cm cylinder interferometer. We have demon- 
strated that foreground cleaning does not significantly 
degrade 21 cm power spectrum estimates on BAO scales 
and below compared to a hypothetical foreground-free 
measurement. We anticipate that the spherical harmonic 
transit telescope formalism will be a powerful tool that 
can be applied to inform experimental design and test 
the interplay between real-world systematics and design 
constraints on twenty-firs t century 21 cm sc ience. We 



will explore this further in Shaw et al. (2013) 
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APPENDIX 

A. SIGNAL AND FOREGROUND MODELS 

We model the 21 cm signal and foregrounds as isotropic 
fields described by an angular power spect rum z^'). 
We base our models on Santos et al. (2005), although we 
will only include the galactic synchrotron and extragalac- 
tic point source contributions. Both these contributions 
are assumed to take the form 



Ci{v,v') = A 



I 

Too 



(Al) 



As the models are calibrated for observations of the 
reionisation epoch, we need to transform them into the 
higher frequencies we are concerned with. We list the 



parameters for both these models in Table 1 



For t he point source model, wh ich is based on the re- 
sults of|Di Matteo et al.l (2002), we change the pivot 
fflz 



frequency vq from 150 MHz to 408 MHz and also rescale 
the amplitude in order to raise the maximum fiux of un- 
subtracted sources from 0.1 mJy to 0.1 Jy. 

The galactic synchrotron model we use is not only cali- 
brated for low frequencies but also high galactic latitudes. 
As we will measure large fractions of the sky we take this 
into account by changing the A and angular power-law 



Table 1 

Our model for the angular pow er spectrum Ci(y^ u') is based on 

those of [Santos et al.| ( [2Q05| however we have adapted the 
parameters to better suit the lull-sky intensity mapping regime 
we are interested in. 





A (K2) 


a 




c 


Galaxy 


6.6 X 10-^ 


2.80 


2.8 


4.0 


Point Sources 


3.55 X 10-4 


2.10 


1.1 


1.0 



index /3 to be consistent with the angular power spec- 
trum of the 408 MHz Haslam map for galactic latitudes 
\h\ > 5° from |La Porta et al | ( [20081 ). 

We model the 21cm brightness temperature as being 
a biased tracer of the underlying matter fiuctuations. 
These fiuctuations are natually characterised by t he an- 
gular power spectrum (Lewis 



Challinor 2007] [Datta 
et al.p OOT). However exact calculation of this quantity 
requires double-integration over highly oscillatory func- 
tions, instead we use the fiat-sky approximation from 
Datta et al.|(|2007D 



1 



dk\\ COS {k\\ Ax) Pn{k;z,z') (A2) 



where x and x^ are the comoving distances to redshift z 
and z^ Their difference is denoted by Ax = X~X^' The 
vector k has the components k\\ and //x in the directions 
parallel and perpendicular to the line of sight (x is the 
mean of x and xO- T his approximation is accurate to 
the 1% level for / > 10 (Datta et al. 2007). 

We model the 21cm brightness temperature power 
spectrum Pt^ as 

PT.ik; z, z') = n{z)n{z') {b + f^^fPm{k- z, z') (A3) 

where Pm(k\z^z') = P{k)Dj^{z)Dj^{z') is the real-space 
matter power spectrum, Dj^{z) is the growth factor nor- 
malised such that l^+(0) = 1, 6 is the bias, and the 
growth rate / = din D^/dln a, the logarithmic deriva- 
tive of the growth factor Dj^. We assume that the bias 
6 = 1 at all redshifts. The mean brightness temperature 
is assumed to take the form 



n{z) = 0.3 



HI 



10- 



^71 



(1 



0.29 



-1/2 



2.5 



1/2 



mK (A4) 



given in Chang et al. (2008). We assume that the neutral 
hydrogen fraction takes a value Q.m = 5 x 10~^ (Masui 
et al.,2013). 

B. SIMULATING ALL-SKY RADIO EMISSION 

B.l. Galactic Synchrotron 

In order to test our methods we require simulated maps 
of the Galactic emission from our own galaxy in the range 
400-1400 MHz with 1 MHz resolution. Though there are 
maps at both 800 MHz and 1420 MHz, the only public 
all-sky rad io survey in this range is the 408 MHz Haslam 



map (Has lam et al. 1982[). Howev er, the Global Sky 



Model ( |de Uliveira- Costa et al.| 2008 ) is based on a com- 
pilation oTlnapsTfonrTIJTyiTI^ 94 GHz. We use the 
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Global Sky Model to generate maps at both 400 MHz 
and 1420 MHz, and use these to estimate an effective 
spectral at each location on the sky 



a{n) 



lQgTi42o(n) - logT4oo(n) 
log 1420 -log 400 



(Bl) 



By combining this with the Haslam mapp^ we can ex- 
trapolate to simulate a map of the sky at any desired set 
of frequencies 



Tha.se{n,l^) = T408(n) 



408 MHz 



(B2) 



Unfortunately this map lacks both the small scale angu- 
lar fluctuations (because of the limited resolution of the 
Haslam map) and any spectral variations (because of the 
power law extrapolation) that would be present on the 
real sky. It is essential to include these to accurately test 
any foreground removal method. 

To include t hese fluctuations we could add gaussian 
realisations of Equation (Al) (with the galactic syn- 
chrotron parameters, see 'lable 1 ) to the base map, which 
contain frequency and angular lluctuations at arbitrary 
resolutions. However the Haslam map already constrains 
what the sky looks like on scales ^1°, and the extrap- 
olation with the spectral index map, is a constraint on 
the sky at 1420MHz (on scales larger than 5.1°, the res- 
olution of the Global Sky Model). Therefore we would 
like the combined simulated map to be consistent with 
these observations. We can do this by constraining the 
realisations to ensure there are no fluctuations on the 
scales constrained. In practice we do this by manipulat- 
ing the amplitudes of the tw o hig hest valued eigenmodes 
of Ci(v^v') (from Equation Al) in each realisation, to 
ensure that the 408 MHz and 1420 MHz slices are zero 
when smoothed on 1° and 5.1° scales respectively. 

A further problem is that we know the amplitude of 
small scale fluctuations varies over the sky, however our 
realisations are statistically iso tropic. This is clearly 
demonstrated in the analysis of La Porta et al.| (^008 ) , 
which shows that the amplitude of the angular power 
spectrum traces the galatic structure. To reproduce this 
we use the RMS amplitude of fluctuations across the 
Haslam map in ~ 4° patches (corresponding to Healpix 
pixels with NSIDE = 16), to rescale the fluctuations. In 
particular this generates an angular power spectrum on 
the whole sky which is consistent with a single power- 
law even when crossing through the beam-scale of the 
Haslam map into the simulated fluctuations. We do not 
include variations of the power-law index of the angular 
power spectrum as there ap pears to be no s tructure to 
the small variations found in La Porta et al. (2008). 



B.2. Extra- Galactic Point Sources 

Our point source maps are constructed from two com- 
ponents, a population of bright point sources {S > 0.1 Jy 
at 151 MHz) simulated directly, and a background of 
dimmer unresolved point sources {S < O.lJy) modelled 
as a gaussian random field. 

The former is constructed d irectly by drawing from 
the point source distribution of Di Matteo et al. (2002), 



each sourced is modelled as having pure power law emis- 
sion with a random spectral index. The sources are 
distributed randomly over the sky. Very bright sources 
{S > 10 Jy) are assumed to have been subtracted such 
that their residuals are less than this threshold. 
The unresolved backffl ound is simulate d by drawing a 



gaussian realisation fro m [Equation (Al) with the point 
source model detailed in liable Tl 

B.3. 21cm Signal 

Simulations of the Cosmological 21cm emission are per- 
formed by drawing gaussian realisations from t he flat- 



sky angul ar power spectrum (calculated using Equa- 
tion (A2)D. 
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